OCR exploration server

The optical character recognition of Urdu-like cursive scripts

Internal identifier: 000119 (Main/Exploration); previous: 000118; next: 000120

Authors: Saeeda Naz [Pakistan]; Khizar Hayat [Pakistan, Oman]; Muhammad Imran Razzak [Saudi Arabia]; Muhammad Waqas Anwar [Pakistan]; Sajjad A. Madani [Pakistan]; Samee U. Khan [United States]

RBID : Pascal:14-0080822

Abstract

We survey the optical character recognition (OCR) literature with reference to the Urdu-like cursive scripts. In particular, the Urdu, Pushto, and Sindhi languages are discussed, with the emphasis being on the Nasta'liq and Naskh scripts. Before detailing the OCR works, the peculiarities of the Urdu-like scripts are outlined, followed by a presentation of the available text image databases. For the sake of clarity, the various attempts are grouped into three parts, namely: (a) printed, (b) handwritten, and (c) online character recognition. Within each part, the works are analyzed with respect to a typical OCR pipeline with an emphasis on the preprocessing, segmentation, feature extraction, classification, and recognition.


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">The optical character recognition of Urdu-like cursive scripts</title>
<author>
<name sortKey="Naz, Saeeda" sort="Naz, Saeeda" uniqKey="Naz S" first="Saeeda" last="Naz">Saeeda Naz</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>COMSATS Institute of Information Technology</s1>
<s2>Abbottabad</s2>
<s3>PAK</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>Pakistan</country>
<wicri:noRegion>COMSATS Institute of Information Technology</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Hayat, Khizar" sort="Hayat, Khizar" uniqKey="Hayat K" first="Khizar" last="Hayat">Khizar Hayat</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>COMSATS Institute of Information Technology</s1>
<s2>Abbottabad</s2>
<s3>PAK</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>Pakistan</country>
<wicri:noRegion>COMSATS Institute of Information Technology</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<inist:fA14 i1="04">
<s1>University of Nizwa, Sultanate of Oman</s1>
<s3>OMN</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Oman</country>
<wicri:noRegion>University of Nizwa, Sultanate of Oman</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Muhammad Imran Razzak" sort="Muhammad Imran Razzak" uniqKey="Muhammad Imran Razzak" last="Muhammad Imran Razzak">MUHAMMAD IMRAN RAZZAK</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>King Saud bin Abdulaziz University for Health Sciences</s1>
<s2>Riyadh</s2>
<s3>SAU</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Arabie saoudite</country>
<wicri:noRegion>King Saud bin Abdulaziz University for Health Sciences</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Muhammad Waqas Anwar" sort="Muhammad Waqas Anwar" uniqKey="Muhammad Waqas Anwar" last="Muhammad Waqas Anwar">MUHAMMAD WAQAS ANWAR</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>COMSATS Institute of Information Technology</s1>
<s2>Abbottabad</s2>
<s3>PAK</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>Pakistan</country>
<wicri:noRegion>COMSATS Institute of Information Technology</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Madani, Sajjad A" sort="Madani, Sajjad A" uniqKey="Madani S" first="Sajjad A." last="Madani">Sajjad A. Madani</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>COMSATS Institute of Information Technology</s1>
<s2>Abbottabad</s2>
<s3>PAK</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>Pakistan</country>
<wicri:noRegion>COMSATS Institute of Information Technology</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Khan, Samee U" sort="Khan, Samee U" uniqKey="Khan S" first="Samee U." last="Khan">Samee U. Khan</name>
<affiliation wicri:level="2">
<inist:fA14 i1="03">
<s1>North Dakota State University</s1>
<s2>Fargo, ND 58108-6050</s2>
<s3>USA</s3>
<sZ>6 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Dakota du Nord</region>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">14-0080822</idno>
<date when="2014">2014</date>
<idno type="stanalyst">PASCAL 14-0080822 INIST</idno>
<idno type="RBID">Pascal:14-0080822</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000025</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000739</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000002</idno>
<idno type="wicri:doubleKey">0031-3203:2014:Naz S:the:optical:character</idno>
<idno type="wicri:Area/Main/Merge">000120</idno>
<idno type="wicri:Area/Main/Curation">000119</idno>
<idno type="wicri:Area/Main/Exploration">000119</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">The optical character recognition of Urdu-like cursive scripts</title>
<author>
<name sortKey="Naz, Saeeda" sort="Naz, Saeeda" uniqKey="Naz S" first="Saeeda" last="Naz">Saeeda Naz</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>COMSATS Institute of Information Technology</s1>
<s2>Abbottabad</s2>
<s3>PAK</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>Pakistan</country>
<wicri:noRegion>COMSATS Institute of Information Technology</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Hayat, Khizar" sort="Hayat, Khizar" uniqKey="Hayat K" first="Khizar" last="Hayat">Khizar Hayat</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>COMSATS Institute of Information Technology</s1>
<s2>Abbottabad</s2>
<s3>PAK</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>Pakistan</country>
<wicri:noRegion>COMSATS Institute of Information Technology</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<inist:fA14 i1="04">
<s1>University of Nizwa, Sultanate of Oman</s1>
<s3>OMN</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Oman</country>
<wicri:noRegion>University of Nizwa, Sultanate of Oman</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Muhammad Imran Razzak" sort="Muhammad Imran Razzak" uniqKey="Muhammad Imran Razzak" last="Muhammad Imran Razzak">MUHAMMAD IMRAN RAZZAK</name>
<affiliation wicri:level="1">
<inist:fA14 i1="02">
<s1>King Saud bin Abdulaziz University for Health Sciences</s1>
<s2>Riyadh</s2>
<s3>SAU</s3>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Arabie saoudite</country>
<wicri:noRegion>King Saud bin Abdulaziz University for Health Sciences</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Muhammad Waqas Anwar" sort="Muhammad Waqas Anwar" uniqKey="Muhammad Waqas Anwar" last="Muhammad Waqas Anwar">MUHAMMAD WAQAS ANWAR</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>COMSATS Institute of Information Technology</s1>
<s2>Abbottabad</s2>
<s3>PAK</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>Pakistan</country>
<wicri:noRegion>COMSATS Institute of Information Technology</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Madani, Sajjad A" sort="Madani, Sajjad A" uniqKey="Madani S" first="Sajjad A." last="Madani">Sajjad A. Madani</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>COMSATS Institute of Information Technology</s1>
<s2>Abbottabad</s2>
<s3>PAK</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>4 aut.</sZ>
<sZ>5 aut.</sZ>
</inist:fA14>
<country>Pakistan</country>
<wicri:noRegion>COMSATS Institute of Information Technology</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Khan, Samee U" sort="Khan, Samee U" uniqKey="Khan S" first="Samee U." last="Khan">Samee U. Khan</name>
<affiliation wicri:level="2">
<inist:fA14 i1="03">
<s1>North Dakota State University</s1>
<s2>Fargo, ND 58108-6050</s2>
<s3>USA</s3>
<sZ>6 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<placeName>
<region type="state">Dakota du Nord</region>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">Pattern recognition</title>
<title level="j" type="abbreviated">Pattern recogn.</title>
<idno type="ISSN">0031-3203</idno>
<imprint>
<date when="2014">2014</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">Pattern recognition</title>
<title level="j" type="abbreviated">Pattern recogn.</title>
<idno type="ISSN">0031-3203</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Character recognition</term>
<term>Feature extraction</term>
<term>Image databank</term>
<term>Manuscript character</term>
<term>On line processing</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Segmentation</term>
<term>Signal processing</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Reconnaissance optique caractère</term>
<term>Banque image</term>
<term>Caractère manuscrit</term>
<term>Traitement en ligne</term>
<term>Reconnaissance caractère</term>
<term>Segmentation</term>
<term>Extraction caractéristique</term>
<term>Reconnaissance forme</term>
<term>Traitement signal</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">We survey the optical character recognition (OCR) literature with reference to the Urdu-like cursive scripts. In particular, the Urdu, Pushto, and Sindhi languages are discussed, with the emphasis being on the Nasta'liq and Naskh scripts. Before detailing the OCR works, the peculiarities of the Urdu-like scripts are outlined, followed by a presentation of the available text image databases. For the sake of clarity, the various attempts are grouped into three parts, namely: (a) printed, (b) handwritten, and (c) online character recognition. Within each part, the works are analyzed with respect to a typical OCR pipeline with an emphasis on the preprocessing, segmentation, feature extraction, classification, and recognition.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Arabie saoudite</li>
<li>Oman</li>
<li>Pakistan</li>
<li>États-Unis</li>
</country>
<region>
<li>Dakota du Nord</li>
</region>
</list>
<tree>
<country name="Pakistan">
<noRegion>
<name sortKey="Naz, Saeeda" sort="Naz, Saeeda" uniqKey="Naz S" first="Saeeda" last="Naz">Saeeda Naz</name>
</noRegion>
<name sortKey="Hayat, Khizar" sort="Hayat, Khizar" uniqKey="Hayat K" first="Khizar" last="Hayat">Khizar Hayat</name>
<name sortKey="Madani, Sajjad A" sort="Madani, Sajjad A" uniqKey="Madani S" first="Sajjad A." last="Madani">Sajjad A. Madani</name>
<name sortKey="Muhammad Waqas Anwar" sort="Muhammad Waqas Anwar" uniqKey="Muhammad Waqas Anwar" last="Muhammad Waqas Anwar">MUHAMMAD WAQAS ANWAR</name>
</country>
<country name="Oman">
<noRegion>
<name sortKey="Hayat, Khizar" sort="Hayat, Khizar" uniqKey="Hayat K" first="Khizar" last="Hayat">Khizar Hayat</name>
</noRegion>
</country>
<country name="Arabie saoudite">
<noRegion>
<name sortKey="Muhammad Imran Razzak" sort="Muhammad Imran Razzak" uniqKey="Muhammad Imran Razzak" last="Muhammad Imran Razzak">MUHAMMAD IMRAN RAZZAK</name>
</noRegion>
</country>
<country name="États-Unis">
<region name="Dakota du Nord">
<name sortKey="Khan, Samee U" sort="Khan, Samee U" uniqKey="Khan S" first="Samee U." last="Khan">Samee U. Khan</name>
</region>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000119 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000119 | SxmlIndent | more
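
Once the record has been printed, it can also be inspected outside Dilib. The following is a minimal Python sketch, not part of Dilib, that lists each author with the countries of their affiliations. It assumes the record has the element layout shown in the XML on this page. Because the `wicri:` and `inist:` prefixes are not declared in the record as displayed, the sketch binds them to placeholder URIs (`urn:example:...`), an assumption made only so that a standard XML parser accepts the text. The function names and the abridged `SAMPLE` are illustrative.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the record shown on this page (first two authors only).
SAMPLE = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">The optical character recognition of Urdu-like cursive scripts</title>
<author>
<name sortKey="Naz, Saeeda" first="Saeeda" last="Naz">Saeeda Naz</name>
<affiliation wicri:level="1">
<country>Pakistan</country>
</affiliation>
</author>
<author>
<name sortKey="Hayat, Khizar" first="Khizar" last="Hayat">Khizar Hayat</name>
<affiliation wicri:level="1">
<country>Pakistan</country>
</affiliation>
<affiliation wicri:level="1">
<country>Oman</country>
</affiliation>
</author>
</titleStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""


def parse_record(text):
    """Parse a record whose wicri:/inist: prefixes are undeclared."""
    # Assumption: placeholder namespace URIs, used only to satisfy the parser.
    patched = text.replace(
        "<record>",
        '<record xmlns:wicri="urn:example:wicri" '
        'xmlns:inist="urn:example:inist">',
        1,
    )
    return ET.fromstring(patched)


def authors_with_countries(root):
    """Return (author name, [countries]) pairs from the first titleStmt."""
    title_stmt = root.find(".//titleStmt")
    pairs = []
    for author in title_stmt.findall("author"):
        name = author.findtext("name")
        countries = [c.text for c in author.findall("affiliation/country")]
        pairs.append((name, countries))
    return pairs


if __name__ == "__main__":
    for name, countries in authors_with_countries(parse_record(SAMPLE)):
        print(name, "-", ", ".join(countries))
```

Reading only the first `titleStmt` avoids counting each author twice, since the record repeats the author list under `sourceDesc`. In practice the text passed to `parse_record` would be the saved output of `HfdSelect`, assuming that output is the bare `<record>` element.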

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:14-0080822
   |texte=   The optical character recognition of Urdu-like cursive scripts
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024